betAS command-line interface (CLI) tutorial

Mariana Ascensão-Ferreira

2023-10-11

betAS is a user-friendly R package that allows intuitive analysis and visualisation of differential alternative splicing (AS) based on beta distributions.

Beta distributions are suitable to quantify inclusion proportions of alternative sequences, using RNA sequencing reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the inclusion proportion as mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. betAS uses beta distributions to accurately model PSI values and their precision, to quantitatively and visually compare AS between groups of samples.

betAS allows the analysis of user-provided tables with AS quantifications, such as those obtained by vast-tools, rMATS or Whippet, ranking differentially spliced events by a significance metric that incorporates the compromise between the uncertainty in individual sample estimates and the variability among replicates.

Please note that this tutorial will provide an overview on how to use betAS R package for alternative splicing analyses. For a more detailed explanation on the methods used, please check betAS pre-print.

1 Installing and starting the program

Install betAS by typing the following in an R console:

devtools::install_github("marianaferreira/betAS")

After the installation, load betAS and other required packages by typing:

#devtools::load_all() # delete this later, useful to test
library(betAS)
# For visualization
library(ggplot2)
library(plotly) 
library(dplyr)
library(ggpubr)

2 Loading alternative splicing quantification data

betAS accepts files sourced from vast-tools, rMATS, or whippet. These files should contain junction reads information (e.g. results obtained from the vast-tools’ tidy module are not supported).

Based on the value assigned to the tool parameter, the pathTables parameter can take the following input types:

# Example to load data from vast-tools
dataset <- getDataset(pathTables="path/to/dataset/*INCLUSION_LEVELS_FULL*.tab", tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables="path/to/dataset/*MATS.JC.txt", tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=list("path/to/sample1/*.psi.gz",
                                      "path/to/sample2/*.psi.gz",
                                      "path/to/sample3/*.psi.gz"), tool="whippet")

For users who may not have files to upload, there’s also an option to work with our built-in dataset. This dataset is a subset of the Deep transcriptional profiling of longitudinal changes during neurogenesis and network maturation in vivo public mouse dataset:

Run Cell Type Div
SRR645824 ESC DIV-8
SRR645826 ESC DIV-8
SRR645828 ESC DIV-8
SRR645830 ESC DIV-8
SRR645872 Neuron DIV28
SRR645875 Neuron DIV28
SRR645877 Neuron DIV28
SRR645879 Neuron DIV28

To use the pre-built datasets, simply don’t specify the file and choose the tool you prefer.

# Example to load data from vast-tools
dataset <- getDataset(pathTables=NULL, tool="vast-tools")

# Example to load data from rMATS
dataset <- getDataset(pathTables=NULL, tool="rMATS")

# Example to load data from whippet
dataset <- getDataset(pathTables=NULL, tool="whippet")

3 Filtering datasets

betAS allows to filter the dataset based on alternative splicing event types, PSI value range to consider and minimum number of supporting reads.

For simplicity, in this tutorial we will be using betAS provided dataset from vast-tools to illustrate such features. The getDataset function retrieves the inclusion tables specific to the chosen tool for further analyses, and the getEvents formats inclusion tables into a standard format that is compatible with betAS.

The output from getEvents is an object with 4 attributes:

tool <-  "vast-tools"

# Import data
dataset <- getDataset(pathTables=NULL, tool = tool)
# Get data ready for betAS
dataset <- getEvents(dataset, tool = tool)

lapply(dataset,head)
## $PSI
##     GENE        EVENT                     COORD LENGTH                                                      FullCO COMPLEX SRR645824 SRR645826 SRR645828 SRR645830 SRR645872 SRR645875
## 2  Cdc45 MmuEX6061348   chr16:18795095-18795226    132 chr16:18795825+18795831,18795095-18795226,18794860+18794874       S    100.00    100.00    100.00    100.00     100.0    100.00
## 7   Narf MmuEX0030885 chr11:121250293-121250430    138               chr11:121249192,121250293-121250430,121250760       S    100.00    100.00    100.00    100.00     100.0    100.00
## 9  Scmh1 MmuEX0041239  chr4:120524998-120525063     66                chr4:120522649,120524998-120525063,120528156       S    100.00    100.00     92.86    100.00     100.0     94.74
## 11  Tfe3 MmuEX0046712      chrX:7767893-7767997    105                        chrX:7767684,7767893-7767997,7769430       S     93.62     89.36     92.16     87.34      81.4    100.00
## 12 Axin2 MmuEX0007157 chr11:108942935-108943129    195               chr11:108942696,108942935-108943129,108943656       S    100.00    100.00    100.00    100.00     100.0    100.00
## 13 Axin2 MmuEX0007154 chr11:108944510-108944605     96               chr11:108943886,108944510-108944605,108945880       S    100.00    100.00    100.00    100.00     100.0    100.00
##    SRR645877 SRR645879
## 2     100.00    100.00
## 7     100.00    100.00
## 9     100.00     95.35
## 11     94.74     90.24
## 12    100.00    100.00
## 13    100.00    100.00
## 
## $Qual
##     GENE        EVENT                     COORD LENGTH                                                      FullCO COMPLEX                       SRR645824.Q
## 2  Cdc45 MmuEX6061348   chr16:18795095-18795226    132 chr16:18795825+18795831,18795095-18795226,18794860+18794874       S SOK,SOK,92=102=0,OK,S@194.00,0.00
## 7   Narf MmuEX0030885 chr11:121250293-121250430    138               chr11:121249192,121250293-121250430,121250760       S     OK,OK,19=30=0,OK,S@49.00,0.00
## 9  Scmh1 MmuEX0041239  chr4:120524998-120525063     66                chr4:120522649,120524998-120525063,120528156       S  VLOW,VLOW,9=13=0,OK,S@22.00,0.00
## 11  Tfe3 MmuEX0046712      chrX:7767893-7767997    105                        chrX:7767684,7767893-7767997,7769430       S     OK,OK,36=52=3,OK,S@85.19,5.81
## 12 Axin2 MmuEX0007157 chr11:108942935-108943129    195               chr11:108942696,108942935-108943129,108943656       S         N,N,6=8=0,OK,S@14.00,0.00
## 13 Axin2 MmuEX0007154 chr11:108944510-108944605     96               chr11:108943886,108944510-108944605,108945880       S          N,N,5=3=0,OK,S@8.00,0.00
##                           SRR645826.Q                       SRR645828.Q                      SRR645830.Q                      SRR645872.Q                      SRR645875.Q
## 2  SOK,SOK,102=102=0,OK,S@204.00,0.00  SOK,SOK,94=98=0,OK,S@192.00,0.00 SOK,SOK,75=70=0,OK,S@145.00,0.00         N,N,1=2=0,OK,S@3.00,0.00         N,N,2=3=0,OK,S@5.00,0.00
## 7       OK,OK,26=18=0,OK,S@44.00,0.00     OK,OK,26=19=0,OK,S@45.00,0.00    OK,OK,20=20=0,OK,S@40.00,0.00 SOK,SOK,66=68=0,OK,S@134.00,0.00 SOK,SOK,66=54=0,OK,S@120.00,0.00
## 9     LOW,LOW,13=15=0,OK,S@28.00,0.00 VLOW,VLOW,12=14=1,OK,S@25.07,1.93 VLOW,VLOW,12=7=0,OK,S@19.00,0.00  LOW,LOW,12=24=0,OK,S@36.00,0.00  LOW,LOW,17=19=1,OK,S@35.05,1.95
## 11      OK,OK,31=53=5,OK,S@79.53,9.47     OK,OK,44=50=4,OK,S@90.32,7.68    OK,OK,35=34=5,OK,S@64.63,9.37  LOW,LOW,14=21=4,OK,S@31.75,7.25    OK,OK,22=18=0,OK,S@40.00,0.00
## 12           N,N,1=8=0,B2,S@9.00,0.00        N,N,2=13=0,B2,S@15.00,0.00  LOW,LOW,11=16=0,OK,S@27.00,0.00         N,N,4=4=0,OK,S@8.00,0.00         N,N,3=5=0,OK,S@8.00,0.00
## 13           N,N,5=2=0,B1,S@7.00,0.00  VLOW,VLOW,7=10=0,OK,S@17.00,0.00 VLOW,VLOW,12=5=0,B1,S@17.00,0.00        N,N,9=3=0,B1,S@12.00,0.00        N,N,8=4=0,OK,S@12.00,0.00
##                         SRR645877.Q                      SRR645879.Q
## 2          N,N,2=1=0,OK,S@3.00,0.00         N,N,1=2=0,OK,S@3.00,0.00
## 7  SOK,SOK,73=70=0,OK,S@143.00,0.00 SOK,SOK,90=63=0,OK,S@153.00,0.00
## 9     OK,OK,15=24=0,OK,S@39.00,0.00  LOW,LOW,27=14=1,OK,S@40.05,1.95
## 11 VLOW,VLOW,9=27=1,B1,S@35.05,1.95    OK,OK,16=21=2,OK,S@35.19,3.81
## 12         N,N,5=4=0,OK,S@9.00,0.00        N,N,4=8=0,OK,S@12.00,0.00
## 13        N,N,7=5=0,OK,S@12.00,0.00       N,N,10=3=0,B1,S@13.00,0.00
## 
## $EventsPerType
## 
##  Alt3  Alt5   ANN    C1    C2    C3 
## 85993 61115 24750  5699  5179  5896 
## 
## $Samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830" "SRR645872" "SRR645875"

We will be using betAS metadata for vast-tools built-in example to ease group creation.

data("VT2_metadata_mouse")
(metadata <- as.data.frame(VT2_metadata_mouse))
Run CellType Div
SRR645824 ESC DIV-8
SRR645826 ESC DIV-8
SRR645828 ESC DIV-8
SRR645830 ESC DIV-8
SRR645872 Neuron DIV28
SRR645875 Neuron DIV28
SRR645877 Neuron DIV28
SRR645879 Neuron DIV28
# to clean-up the environment of unecessary variables, loaded automatically by betAS when using default datasets
rm(VT2_data_mouse)
rm(VT2_metadata_mouse)

3.1 Filter dataset based on alternative splicing event types and minimum number of supporting reads

betAS allows to filter event types. The supported event types are in accordance to the ones documented in the alternative splicing quantification tools supported by betAS:

betAS also allows to filter events with less than a given number of junction reads (sum of inclusion and exclusion reads) in at least one sample. To do so, the user can specify the minimum number of reads to consider in the Nparameter.

To simplify, we will be considering only exon skipping events (C1, C2, C3, S & MIC events from vast-tools) with at least 10 junction read counts in all samples.

dataset_filtered <- filterEvents(dataset, types=c("C1", "C2", "C3", "S", "MIC"), N=10)

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 195636

3.2 Filter dataset based on PSI value range

betAS allows to filter events based on their PSI value range. We will be filtering events with a PSI of less than 1 or higher than 99 in all samples, in order to keep only alternative events.

dataset_filtered <- alternativeEvents(dataset_filtered, minPsi=1, maxPsi=99  )

cat(paste0("Alternative events: ", nrow(dataset_filtered$PSI)))
## Alternative events: 1404

3.3 Plotting PSI distribution across samples

After filtering the data to only keep events of interest, we can have an overview of the PSIs on the samples being analysed.

bigPicturePlot <- bigPicturePlot(table = dataset_filtered$PSI)
bigPicturePlot + theme_minimal()

4 Group definition and management from automatically detected or user-defined sample list

To allow for alternative splicing analyses between groups of interest, the user must firstly specify to what groups the samples under analyses belong to.

To check the names of the samples being used, we can access the Samples attribute inside the dataset object.

cat("Sample names: ")
## Sample names:
cat(dataset_filtered$Samples)
## SRR645824 SRR645826 SRR645828 SRR645830 SRR645872 SRR645875 SRR645877 SRR645879

In order to ease group visualization, we can assign a color to each group. In this section we showcase how to assign a group to a given color, based on a manually defined palette. To automatically define color palettes of N colors the user might want to check Custom color palette for group definition section of this tutorial.

4.1 Automatic group definition based on sample name

We can define sample groups based on the name of the samples being used, assuming that the sample naming has some characteristics that allow to easily identify the group to which they belong to.

samples <- dataset_filtered$Samples 
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7", "#C7CEEA", "#FBE2FD", "#D9ECFE")
# Automatic group definition
not_grouped   <- samples
checked       <- c()
groups        <- LETTERS[seq(1, length = min(length(names), 26))]
used_groups   <- groups
groupList <- list()

while((length(checked) < length(names)) & length(used_groups) <= 26){
  
  groupNames  <- agrep(pattern = not_grouped[1], x = not_grouped, value = TRUE)
  checked     <- c(checked, groupNames)
  not_grouped <- not_grouped[-c(match(groupNames, not_grouped))]
  
  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = used_groups[1],
                                                   samples = groupNames,
                                                   color = random_colors[1])
  names(groupList) <- make.unique(c(currentNames, used_groups[1]))
  
  random_colors <- random_colors[-1]
  used_groups   <- used_groups[-1]
  
}

groupList
## $A
## $A$name
## [1] "A"
## 
## $A$samples
## [1] "SRR645824" "SRR645826" "SRR645828"
## 
## $A$color
## [1] "#FF9AA2"
## 
## 
## $B
## $B$name
## [1] "B"
## 
## $B$samples
## [1] "SRR645830"
## 
## $B$color
## [1] "#FFB7B2"
## 
## 
## $C
## $C$name
## [1] "C"
## 
## $C$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
## 
## $C$color
## [1] "#FFDAC1"
# Visualize colors being used
slices <- rep(1, length(groupList))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groupList)], border = "black", labels=names(groupList), main = "Color palette for group definition")

4.2 Group definition based on user-defined list

We can also define groups of samples based on a given column of the metadata. For this example we will use the cell type as a grouping variable, i.e. Neurons or Embrionic Stem Cell (ESC). This information can be accessed through the CellType column in the example metadata. The name of the samples to be used can be retrieved by accessing the Run column in the example metadata.

If using user-provided data, the user must also provide a metadata table, or ensure that the final result corresponds to groupList presented in this section.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "CellType"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2")
groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList
## $ESC
## $ESC$name
## [1] "ESC"
## 
## $ESC$samples
## [1] "SRR645824" "SRR645826" "SRR645828" "SRR645830"
## 
## $ESC$color
## [1] "#FF9AA2"
## 
## 
## $Neuron
## $Neuron$name
## [1] "Neuron"
## 
## $Neuron$samples
## [1] "SRR645872" "SRR645875" "SRR645877" "SRR645879"
## 
## $Neuron$color
## [1] "#FFB7B2"
# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

5 Differential alternative splicing between two groups

betAS approach for alternative splicing analyses takes into account two metrics for each alternative splicing event under analyses: the magnitude of the effect (𝚫PSI = PSIbetAS(group A) - PSIbetAS(group B)) and the statistical significance of such changes. For the latter, betAS introduces three approaches: probability of differential splicing (Pdiff), F-statistic or false discovery rate (FDR).

5.1 Calculate statistical metrics for differential alternative splicing & visualize results

Firstly, we need to define the groups we will be analysing. This step is particularly important when we have more than one group defined, but want to compare only two of them.

# Define groups
groupA    <- "Neuron"
groupB    <- "ESC"

# Define samples inside each group
samplesA    <- groupList[[groupA]]$samples
samplesB    <- groupList[[groupB]]$samples

# Convert samples into indexes
colsGroupA    <- convertCols(dataset_filtered$PSI, samplesA)
colsGroupB    <- convertCols(dataset_filtered$PSI, samplesB)


cat("samplesA:")
## samplesA:
cat(samplesA) 
## SRR645872 SRR645875 SRR645877 SRR645879
cat("\nsamplesB:")
## 
## samplesB:
cat(samplesB)
## SRR645824 SRR645826 SRR645828 SRR645830
cat("\n")
cat("colsGroupA: \n")
## colsGroupA:
colsGroupA
## SRR645872 SRR645875 SRR645877 SRR645879 
##        11        12        13        14
cat("colsGroupB:  \n")
## colsGroupB:
colsGroupB
## SRR645824 SRR645826 SRR645828 SRR645830 
##         7         8         9        10

5.1.1 Probability of differential splicing (Pdiff)

To calculate the Pdiff for all events in our dataset, we will use the prepareTableVolcano function. To visualize the results in a volcano plot, we will use the plotvolcanoPdiff function.

volcanoTable_Pdiff <- prepareTableVolcano(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100)

volcano_Pdiff <- plotVolcano(betasTable = volcanoTable_Pdiff,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Pdiff
## Warning: ggrepel: 743 unlabeled data points (too many overlaps). Consider increasing max.overlaps

The above plot colors in pink and labels by default the events with a ∆PSI>0.1. However, if these are in high number, the labels might not appear. To make the plot interactive, we can use the ggplotly function.

# change size points manually
volcano_Pdiff$layers[[1]]$aes_params$size <- 1
volcano_Pdiff$layers[[2]]$aes_params$size <- 1

# Plot interactively
plotly_volcano_Pdiff <- ggplotly(volcano_Pdiff, width = 700, height = 500) %>%
                          layout(
                            font = list(size = 10),
                            xaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            ),
                            yaxis = list(
                              title = list(font = list(size = 14)),  # Adjust as necessary
                              tickfont = list(size = 10)  # Adjust the tick font size here
                            )
                          )
## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
hover_text  <- paste(
  "Event: ", volcanoTable_Pdiff$EVENT,
  "<br>Pdiff: ", round(volcanoTable_Pdiff$Pdiff,3),
  "<br>Deltapsi: ", round(volcanoTable_Pdiff$deltapsi,3),
  sep = ""
)

 

plotly_volcano_Pdiff$x$data[[1]]$text <- hover_text # all points
plotly_volcano_Pdiff$x$data[[2]]$text <- NULL 
  
plotly_volcano_Pdiff

The results can also be manually inspected by browsing the volcanoTable_Pdiff table

head(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")])
GENE EVENT COORD Pdiff deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 0.9805 0.1393190
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 0.9150 -0.0223852
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 0.8840 -0.0446254
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 1.0000 0.6235921
161 Chtop MmuEX0001048 chr3:90505395-90505478 0.5765 0.0043185
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 1.0000 -0.5774750

5.1.2 F-statistic

To calculate the F-statistic for all events in our dataset, we will use the prepareTableVolcanoFstat function. To visualize the results in a volcano plot, we will use the plotVolcanoFstat function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_Fstat <- prepareTableVolcanoFstat(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100)

volcano_Fstat <- plotVolcanoFstat(betasTable = volcanoTable_Fstat,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_Fstat
## Warning: ggrepel: 719 unlabeled data points (too many overlaps). Consider increasing max.overlaps

head(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")])
GENE EVENT COORD Fstat deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 4.318198 0.1384826
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 2.642154 -0.0230877
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 2.101543 -0.0462705
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 9.648136 0.6219448
161 Chtop MmuEX0001048 chr3:90505395-90505478 1.018227 0.0034872
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 10.510154 -0.5784486

5.1.3 False discovery rate (FDR)

To calculate the FDR for all events in our dataset, we will use the prepareTableVolcanoFDR function. To visualize the results in a volcano plot, we will use the plotVolcanoFDR function. The user can also use the approach presented above to make the plot interactive.

volcanoTable_FDR <- prepareTableVolcanoFDR(psitable = dataset_filtered$PSI,
                                    qualtable = dataset_filtered$Qual,
                                    npoints = 500,
                                    colsA = colsGroupA,
                                    colsB = colsGroupB,
                                    labA = groupA,
                                    labB = groupB,
                                    basalColor = "#89C0AE",
                                    interestColor = "#E69A9C",
                                    maxDevTable = maxDevSimulationN100,
                                    nsim = 100) 
        
volcano_FDR <- plotVolcanoFDR(betasTable = volcanoTable_FDR,
                            labA = groupA,
                            labB = groupB,
                            basalColor = "#89C0AE",
                            interestColor = "#E69A9C") 

volcano_FDR
## Warning: ggrepel: 734 unlabeled data points (too many overlaps). Consider increasing max.overlaps

head(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")])
GENE EVENT COORD FDR deltapsi
37 Gmpr MmuEX0021295 chr13:45521127-45521219 0.02 0.1379942
98 Hnrnpd MmuEX0023099 chr5:99962134-99962204 0.04 -0.0221179
99 Hnrnpd MmuEX0023100 chr5:99963731-99963877 0.16 -0.0446755
144 Fmr1 MmuEX0019414 chrX:68705861-68705923 0.00 0.6253615
161 Chtop MmuEX0001048 chr3:90505395-90505478 0.45 0.0047099
175 Cnnm3 MmuEX0011891 chr1:36524034-36524132 0.00 -0.5811047

5.2 Visualize individual events

To illustrate betAS approach for an individual event with differential alternative splicing between Neurons and ESC, we will be using the event MmuEX0003638. This is an exon skipping event in the Actn4 gene.

eventID <- "MmuEX0003638"

We can visualize the beta distributions for this event for each sample.

# Auxiliary data to add increment based on the event coverage to avoid dividing by zero 
data("maxDevSimulationN100")

tdensities <- plotIndividualDensitiesList(eventID = eventID,
                                          npoints = 500,
                                          psitable = dataset$PSI,
                                          qualtable = dataset$Qual,
                                          groupList = groupList,
                                          maxDevTable = maxDevSimulationN100)

tdensities + theme_minimal() + ggtitle(eventID)
## Picking joint bandwidth of 0.0033
## Picking joint bandwidth of 0.0135

We can also inspect manually the different statistics for this event:

subset(volcanoTable_Pdiff[,c("GENE","EVENT","COORD","Pdiff","deltapsi")], EVENT==eventID)
GENE EVENT COORD Pdiff deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 1 -0.616764
subset(volcanoTable_Fstat[,c("GENE","EVENT","COORD","Fstat","deltapsi")], EVENT==eventID)
GENE EVENT COORD Fstat deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 11.54205 -0.6136195
subset(volcanoTable_FDR[,c("GENE","EVENT","COORD","FDR","deltapsi")], EVENT==eventID)
GENE EVENT COORD FDR deltapsi
45340 Actn4 MmuEX0003638 chr7:28895121-28895180 0 -0.6163307
plotPdiff <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000) %>% 
  plotPDiffFromEventObjList()

plotFstat <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000) %>% 
  plotFstatFromEventObjList()

plotFDR <- prepareTableEvent(eventID = eventID,
                               psitable = dataset$PSI,
                               qualtable = dataset$Qual,
                               npoints = 500,
                               colsA = colsGroupA,
                               colsB = colsGroupB,
                               labA = groupA,
                               labB = groupB,
                               basalColor = "#89C0AE",
                               interestColor = "#E69A9C",
                               maxDevTable = maxDevSimulationN100,
                               nsim = 1000) %>% 
  plotFDRFromEventObjList()

ggarrange(plotPdiff,plotFstat,plotFDR, ncol=3)

6 Differential alternative splicing between multiple groups

The differential AS approach implemented by betAS can be applied to multiple (i.e., more than two) groups in a novel ANOVA-inspired way that extends the Pdiff definition to the comparison of the differences between samples belonging to different biological conditions to those found between replicates. To illustrate this, we applied multiple-group betAS to the analysis of AS differences in a subset of human transcriptomes of forebrain, hindbrain, heart, kidney, liver and testis. This dataset is a subset of the Human RNA-seq time-series of the development of seven major organs public human dataset.

# load data and prepare it for betAS
data("VT1_data_human")
dataset_multgroups <- getEvents(VT1_data_human, tool = tool)

# load metadata
data("VT1_metadata_human")
metadata_multgroups <- as.data.frame(VT1_metadata_human)
metadata_multgroups
Run age developmental_stage organism_part sex
ERR2598266 14 adolescent forebrain male
ERR2598267 16 adolescent forebrain male
ERR2598268 17 adolescent forebrain male
ERR2598269 13 adolescent forebrain male
ERR2598270 13 adolescent heart male
ERR2598271 13 adolescent hindbrain male
ERR2598272 16 adolescent hindbrain male
ERR2598273 14 adolescent hindbrain male
ERR2598274 17 adolescent hindbrain male
ERR2598275 17 adolescent liver male
ERR2598276 16 adolescent testis male
ERR2598277 17 adolescent testis male
ERR2598278 14 adolescent testis male
ERR2598279 14 adolescent testis male
ERR2598280 58 elderly forebrain male
ERR2598281 58 elderly forebrain male
ERR2598282 58 elderly hindbrain male
ERR2598283 58 elderly hindbrain male
ERR2598284 58 elderly liver male
ERR2598285 58 elderly liver male
ERR2598286 55 elderly testis male
ERR2598305 50 middle adult forebrain male
ERR2598306 53 middle adult forebrain male
ERR2598307 54 middle adult heart male
ERR2598308 53 middle adult hindbrain male
ERR2598309 50 middle adult hindbrain male
ERR2598310 55 middle adult liver male
ERR2598311 46 middle adult testis male
ERR2598346 29 young adult forebrain male
ERR2598347 29 young adult forebrain male
ERR2598348 28 young adult forebrain male
ERR2598349 39 young adult forebrain male
ERR2598350 39 young adult forebrain male
ERR2598351 25 young adult heart male
ERR2598352 29 young adult hindbrain male
ERR2598353 29 young adult hindbrain male
ERR2598354 32 young adult hindbrain female
ERR2598355 39 young adult hindbrain male
ERR2598356 39 young adult hindbrain male
ERR2598357 29 young adult liver male
ERR2598358 39 young adult liver male
ERR2598359 39 young adult liver male
ERR2598360 29 young adult testis male
ERR2598361 29 young adult testis male
ERR2598362 28 young adult testis male
ERR2598363 39 young adult testis male

6.1 Group definition

We will create groups based on the samples’ tissue of origin, encoded in the organism_part metadata’s column.

# Define variable of the metadata table to be used as a grouping variable
groupingVariable <- "organism_part"
# to get the unique values of the grouping variable. In this example, will be Neuron and ESC
groups <- unique(metadata_multgroups[,groupingVariable])
# Define vector of sample names based on the example metadata 
samples <- metadata_multgroups$Run
# Define colors for the two groups
random_colors <- c("#FF9AA2", "#FFB7B2", "#FFDAC1", "#E2F0CB", "#B5EAD7")
groupList <- list()

for(i in 1:length(groups)){

  groupNames <- samples[which(metadata_multgroups[,groupingVariable] == groups[i])]

  # Assign new group
  currentNames <- names(groupList)
  groupList[[length(groupList)+1]] <- list(name = groups[i],
                                           samples = groupNames,
                                           color = random_colors[i])
  names(groupList) <- make.unique(c(currentNames, groups[i]))

}

groupList
## $forebrain
## $forebrain$name
## [1] "forebrain"
## 
## $forebrain$samples
##  [1] "ERR2598266" "ERR2598267" "ERR2598268" "ERR2598269" "ERR2598280" "ERR2598281" "ERR2598305" "ERR2598306" "ERR2598346" "ERR2598347" "ERR2598348" "ERR2598349" "ERR2598350"
## 
## $forebrain$color
## [1] "#FF9AA2"
## 
## 
## $heart
## $heart$name
## [1] "heart"
## 
## $heart$samples
## [1] "ERR2598270" "ERR2598307" "ERR2598351"
## 
## $heart$color
## [1] "#FFB7B2"
## 
## 
## $hindbrain
## $hindbrain$name
## [1] "hindbrain"
## 
## $hindbrain$samples
##  [1] "ERR2598271" "ERR2598272" "ERR2598273" "ERR2598274" "ERR2598282" "ERR2598283" "ERR2598308" "ERR2598309" "ERR2598352" "ERR2598353" "ERR2598354" "ERR2598355" "ERR2598356"
## 
## $hindbrain$color
## [1] "#FFDAC1"
## 
## 
## $liver
## $liver$name
## [1] "liver"
## 
## $liver$samples
## [1] "ERR2598275" "ERR2598284" "ERR2598285" "ERR2598310" "ERR2598357" "ERR2598358" "ERR2598359"
## 
## $liver$color
## [1] "#E2F0CB"
## 
## 
## $testis
## $testis$name
## [1] "testis"
## 
## $testis$samples
##  [1] "ERR2598276" "ERR2598277" "ERR2598278" "ERR2598279" "ERR2598286" "ERR2598311" "ERR2598360" "ERR2598361" "ERR2598362" "ERR2598363"
## 
## $testis$color
## [1] "#B5EAD7"
# Visualize colors being used
slices <- rep(1, length(groups))  # Equal-sized slices for each color
# Display the pie chart with colors
pie(slices, col = random_colors[1:length(groups)], border = "black", labels=groups, main = "Color palette for group definition")

7 Differential alternative splicing across multiple groups/conditions

7.1 Calculate statistical metrics for differential alternative splicing & visualize results

7.1.1 Metric 1

7.1.2 Metric 2

7.2 Visualize individual events

7.3 The monotony coefficient between multiple sequential groups

8 Supplementary Information

8.1 Custom color palette for group definition

In the provided example, the color palette used for group definition was manually defined. If the user wishes to automatically generate a palette of Ncolors, the following code is advisable:

#install.packages("grDevices")
library(grDevices)

generate_pastel_colors <- function(N) {
  pastel_colors <- vector("list", length = N)
  
  for (i in 1:N) {
    # Generate random values for hue, chroma, and luminance
    hue <- runif(1, min = 0, max = 360)  # Adjust the range for a different initial hue
    chroma <- runif(1, min = 50, max = 80)  # Adjust these values for your desired pastel range
    luminance <- runif(1, min = 80, max = 90)  # Adjust these values for your desired pastel range
    
    # Create a pastel color using hcl() with the random values
    pastel_color <- hcl(hue, chroma, luminance)
    
    pastel_colors[[i]] <- pastel_color
  }
  
  return(pastel_colors)
}


# Generate 10 random pastel colors
N <- 10
random_pastel_colors <- generate_pastel_colors(N)

cat("HEX codes of generated pastel colors: ")
## HEX codes of generated pastel colors:
cat(unlist(random_pastel_colors))
## #FFC2DF #B5F07A #B1EB9D #BAE696 #85E6A7 #FFCBA1 #FFB4DA #8FD2FF #55EABF #8FCFFF
# Prepare data for the pie chart
slices <- rep(1, N)  # Equal-sized slices for each color
# Display the pie chart with pastel colors
pie(slices, col = unlist(random_pastel_colors), border = "black", labels=unlist(random_pastel_colors), main = "Automatically generated color palette for group definition")